
Conversation

@doringeman (Collaborator)

Add request monitoring, built on docker/model-runner#157.

In docker/model-runner#157:

MODEL_RUNNER_PORT=8080 make run

Here:

$ make install

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model requests --help
Usage:  docker model requests [OPTIONS]

Fetch requests+responses from Docker Model Runner

Options:
  -f, --follow             Follow requests stream
      --include-existing   Include existing requests when starting to follow (only available with --follow)
      --model string       Specify the model to filter requests

$ MODEL_RUNNER_HOST=http://localhost:8080 docker model requests --include-existing -f --model ai/qwen3:0.6B-Q4_0
Connected to request stream. Press Ctrl+C to stop.
{"id":"sha256:df9f2a333a636ca3a290700759adc29fad54cb2dee5b2f198e1bce26101686eb_1757684269324900000","model":"ai/qwen3:0.6B-Q4_0","method":"POST","url":"/engines/v1/chat/completions","request":"{\"model\":\"ai/qwen3:0.6B-Q4_0\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"stream\":true}","response":"{\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"Hi! How can I assist you today? 😊\",\"reasoning_content\":\"okay, the user just said \\\"hi\\\". I should respond politely. Let me make sure to acknowledge their greeting.\\n\\nFirst, a simple \\\"hi\\\" is good. Then, maybe add something like \\\"Hello! How can I help you today?\\\" That gives a friendly response and opens up the conversation.\\n\\nI should keep it short and positive. Don't be too formal. Something like \\\"Hi! How can I assist you today?\\\" sounds natural and helpful.\\n\\nLet me check if there's any other way to respond. Maybe add a question to engage them further. Like, \\\"Are there any specific topics you'd like to discuss?\\\" That encourages interaction.\\n\\nYes, that works. So the response should be short, friendly, and open for further conversation.\",\"role\":\"assistant\"}}],\"created\":1757684270,\"id\":\"chatcmpl-cEC7vKkmKfxQEXA4Gr6hM0MlXefr5GGs\",\"model\":\"ai/qwen3:0.6B-Q4_0\",\"object\":\"chat.completion\",\"system_fingerprint\":\"b1-c610b6c\",\"timings\":{\"predicted_ms\":981.252,\"predicted_n\":166,\"predicted_per_second\":169.17162971387575,\"predicted_per_token_ms\":5.911156626506024,\"prompt_ms\":33.441,\"prompt_n\":9,\"prompt_per_second\":269.1307078137615,\"prompt_per_token_ms\":3.715666666666667},\"usage\":{\"completion_tokens\":166,\"prompt_tokens\":9,\"total_tokens\":175}}","timestamp":1757684269,"status_code":200,"user_agent":"docker-model-cli/dev"}

Note the shell completion added for --model.
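Since the CLI here is cobra-based (as Docker CLI plugins generally are), that completion is typically wired with RegisterFlagCompletionFunc. A minimal sketch; listLocalModels is a hypothetical stand-in for however the CLI actually enumerates models:

```go
package commands

import "github.com/spf13/cobra"

// listLocalModels is a hypothetical helper standing in for however the
// CLI enumerates locally available models.
func listLocalModels() ([]string, error) {
	return []string{"ai/qwen3:0.6B-Q4_0"}, nil
}

// registerModelCompletion offers model names when the user tab-completes --model.
func registerModelCompletion(cmd *cobra.Command) {
	cmd.Flags().String("model", "", "Specify the model to filter requests")
	_ = cmd.RegisterFlagCompletionFunc("model",
		func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
			models, err := listLocalModels()
			if err != nil {
				return nil, cobra.ShellCompDirectiveError
			}
			return models, cobra.ShellCompDirectiveNoFileComp
		})
}
```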

TODO: Add --format.
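One possible shape for that TODO, assuming it would follow Docker's usual Go-template convention for --format (nothing below is in this PR; the flag semantics and field names are guesses):

```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// entry is a guessed subset of the fields a --format template could expose,
// e.g. --format '{{.Model}} {{.StatusCode}}'.
type entry struct {
	Model      string
	Method     string
	URL        string
	StatusCode int
}

// printFormatted renders one entry through a user-supplied Go template.
func printFormatted(format string, e entry) error {
	tmpl, err := template.New("format").Parse(format)
	if err != nil {
		return fmt.Errorf("invalid format: %w", err)
	}
	if err := tmpl.Execute(os.Stdout, e); err != nil {
		return err
	}
	fmt.Println()
	return nil
}

func main() {
	_ = printFormatted("{{.Method}} {{.URL}} -> {{.StatusCode}}",
		entry{Model: "ai/qwen3:0.6B-Q4_0", Method: "POST", URL: "/engines/v1/chat/completions", StatusCode: 200})
}
```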

@xenoscopic (Contributor) left a comment


LGTM, though I still agree with @ilopezluna that it'd be worth combining the two endpoints and differentiating on Accept in docker/model-runner#157, if it's not too much work.

Differentiate regular and streaming responses based on the Accept header.

Signed-off-by: Dorin Geman <[email protected]>
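For context, the server-side pattern that commit describes looks roughly like this. The exact media type the runner keys on isn't shown in this thread; text/event-stream is an assumption for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// handleRequests serves both modes from a single endpoint, switching on Accept.
func handleRequests(w http.ResponseWriter, r *http.Request) {
	if strings.Contains(r.Header.Get("Accept"), "text/event-stream") {
		f, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		for i := 0; ; i++ { // stand-in for a live feed of logged requests
			select {
			case <-r.Context().Done(): // client went away
				return
			case <-time.After(time.Second):
				fmt.Fprintf(w, "{\"id\":%d}\n", i)
				f.Flush()
			}
		}
	}
	// One-shot mode: return the existing entries as a single JSON array.
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode([]any{ /* existing entries */ })
}

func main() {
	http.HandleFunc("/requests", handleRequests)
	_ = http.ListenAndServe(":8080", nil)
}
```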
@doringeman (Collaborator, Author)

Combined the endpoints as discussed here in e806cc4. Thanks!

@ericcurtin requested a review from Copilot on September 15, 2025 at 14:11.
Copilot AI left a comment

Pull Request Overview

This PR adds a new requests command to the Docker Model CLI that fetches requests and responses from the Docker Model Runner, enabling monitoring of model inference activities. The implementation includes both one-shot and streaming modes with optional model filtering.

  • Adds docker model requests command with streaming and filtering capabilities
  • Updates dependencies to include model-runner support for request monitoring
  • Provides comprehensive documentation and CLI completion for the new feature
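Sketched from the client side, such a method can look like the following. The /requests path, the model query parameter, and the Accept value are assumptions for illustration, not the PR's actual code:

```go
package desktop

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// Client is a trimmed-down stand-in for the CLI's Model Runner client.
type Client struct {
	base string // e.g. http://localhost:8080
	hc   *http.Client
}

// Requests fetches logged requests, optionally as a live stream.
// The caller reads line-delimited JSON from the returned body.
func (c *Client) Requests(model string, follow bool) (io.ReadCloser, error) {
	u := c.base + "/requests" // assumed endpoint path
	if model != "" {
		u += "?model=" + url.QueryEscape(model) // assumed filter parameter
	}
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	if follow {
		req.Header.Set("Accept", "text/event-stream") // assumed streaming media type
	}
	resp, err := c.hc.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return resp.Body, nil
}
```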

Reviewed Changes

Copilot reviewed 8 out of 89 changed files in this pull request and generated 2 comments.

Summary per file:

go.mod: Updates model-runner and model-distribution dependencies to support request monitoring
docs/reference/model_requests.md: Adds documentation for the new requests command
docs/reference/model.md: Updates the parent command documentation to include the requests subcommand
docs/reference/docker_model_requests.yaml: Defines the CLI specification for the requests command options
docs/reference/docker_model.yaml: Updates the parent command specification to include requests
desktop/desktop.go: Implements the HTTP client method for fetching requests with streaming support
commands/root.go: Registers the new requests command with the CLI
commands/requests.go: Implements the requests command with streaming, filtering, and completion


@doringeman merged commit f064505 into docker:main on Sep 17, 2025. 4 checks passed.